Shifted NMF with Group Sparsity for Clustering NMF Basis Functions
Recently, Non-negative Matrix Factorisation (NMF) has found application in the separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive, parts-based representation in which the parts typically correspond to individual notes or chords. However, the NMF basis functions must then be clustered to their sources. Although many attempts have been made to improve the clustering of basis functions to sources, much research is still required in this area. Recently, Shifted Non-negative Matrix Factorisation (SNMF) was used to cluster these basis functions. We propose that incorporating group sparsity into Shifted NMF based methods may benefit the clustering algorithms. We have tested this on SNMF algorithms, with improved separation quality. Results show that this gives improved clustering of pitched basis functions over previous methods.
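The decomposition step described in this abstract can be illustrated with a minimal sketch of plain NMF using the well-known multiplicative updates for the Euclidean cost. This is not the shifted NMF or group-sparsity method the paper proposes; the function name `nmf`, its parameters, and the synthetic test matrix are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates (Euclidean cost).

    V    : non-negative matrix (frequency bins x time frames), e.g. a magnitude spectrogram
    rank : number of basis functions
    Returns W (bins x rank, basis functions) and H (rank x frames, activations).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not increase the cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical usage on a synthetic rank-2 "spectrogram".
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` plays the role of one basis function (ideally a note or chord spectrum), and the matching row of `H` gives its gain over time. Assigning those columns to sources is the clustering problem the abstract addresses.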
On the use of Masking Filters in Sound Source Separation
Many sound source separation algorithms, such as NMF and related approaches, disregard phase information and operate only on magnitude or power spectrograms. In this context, generalised Wiener filters have been widely used to generate masks, which are applied to the original complex-valued spectrogram before inversion to the time domain, as these masks have been shown to give good results. However, these masks may not be optimal from a perceptual point of view. To this end, we propose new families of masks and compare their performance to generalised Wiener filter masks using three different factorisation-based separation algorithms. Further, to date there has been no analysis of how the performance of masking varies with the number of iterations performed when estimating the separated sources. We perform such an analysis and show that, when using these masks, running to convergence may not be required in order to obtain good separation performance.
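As a rough illustration of the baseline mentioned above, a generalised Wiener mask for source k is commonly written as the source's estimated magnitude raised to a power p, divided by the sum of all sources' magnitudes raised to p. The sketch below assumes that form; the function names and the random test data are hypothetical, and the new mask families proposed in the paper are not reproduced here.

```python
import numpy as np

def generalised_wiener_masks(source_mags, p=2.0, eps=1e-12):
    """Soft masks from estimated source magnitude spectrograms.

    source_mags : sequence of non-negative arrays, one per source, all the same shape
    p           : exponent (p=2 gives the classical Wiener-style power ratio)
    Returns an array of shape (n_sources, bins, frames) that sums to ~1 over sources.
    """
    powered = np.stack(source_mags) ** p
    return powered / (powered.sum(axis=0) + eps)

def apply_masks(mixture_stft, masks):
    """Apply each mask to the complex mixture spectrogram (mixture phase is reused)."""
    return masks * mixture_stft[np.newaxis, ...]

# Hypothetical usage: three estimated sources, a 5 x 4 complex mixture spectrogram.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
mags = [rng.random((5, 4)) + 0.1 for _ in range(3)]
estimates = apply_masks(X, generalised_wiener_masks(mags, p=2.0))
```

Because the masks sum to one in every time-frequency bin, the masked estimates add back up to the mixture, and each estimate can be inverted to the time domain with an inverse STFT.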
Simulation of Textured Audio Harmonics Using Random Fractal Phaselets
We present a method of simulating audio signals using the principles of random fractal geometry which, in the context of this paper, is concerned with the analysis of statistically self-affine ‘phaselets’. The approach is used to generate audio signals whose texture and timbre are characterised by the Fractal Dimension, such as those associated with bowed stringed instruments. The paper provides a short overview of potential simulation methods using Artificial Neural Networks and Evolutionary Computing, and of the problems associated with using a deterministic approach based on solutions to the acoustic wave equation. This serves to quantify the origins of the ‘noise’ associated with multiple scattering events that characterise texture and timbre in an audio signal. We then explore a method to compute the phaselet of a phase signal, that is, the primary phase function of which the phase signal is, to a good approximation, a periodic replica. We show that, by modelling the phaselet as a random fractal signal, it can be characterised by the Fractal Dimension. The Fractal Dimension is then used to synthesise a phaselet, from which the phase function is computed through multiple concatenations of the phaselet. The paper provides details of the principal steps associated with the method considered and examines some example results, providing a URL to m-coded functions so that interested readers can repeat the results obtained and develop the algorithms further.
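The synthesis idea can be sketched as follows, under stated assumptions: white noise is shaped in the frequency domain so that its power spectrum falls off as 1/f^β, with β = 5 − 2D for a one-dimensional random fractal signal of Fractal Dimension D between 1 and 2, and the result is then tiled periodically. The function names are hypothetical, and the paper's own m-coded functions may differ in the spectral model, normalisation, and how the phase function is turned into audio.

```python
import numpy as np

def random_fractal_signal(n, fractal_dim, seed=0):
    """Random fractal signal of length n by spectral shaping of white noise.

    Assumes a power spectrum ~ 1/f**beta with beta = 5 - 2*D, for 1 < D < 2.
    """
    if not 1.0 < fractal_dim < 2.0:
        raise ValueError("fractal_dim is assumed to lie strictly between 1 and 2")
    beta = 5.0 - 2.0 * fractal_dim
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    shaping = np.zeros_like(freqs)
    shaping[1:] = freqs[1:] ** (-beta / 2.0)   # amplitude filter; the DC term is zeroed
    signal = np.fft.irfft(spectrum * shaping, n=n)
    return signal / np.max(np.abs(signal))     # normalise to unit peak

def tile_phaselet(phaselet, repeats):
    """Build a periodic phase function by concatenating copies of the phaselet."""
    return np.tile(phaselet, repeats)

# Hypothetical usage: a 256-sample phaselet with D = 1.5, repeated four times.
phaselet = random_fractal_signal(256, fractal_dim=1.5)
phase_function = tile_phaselet(phaselet, repeats=4)
```

A larger D gives a smaller β and therefore a rougher phaselet, which is the sense in which a single parameter controls the texture of the synthesised signal.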
Harmonic-percussive Sound Separation Using Rhythmic Information from Non-negative Matrix Factorization in Single-channel Music Recordings
This paper proposes a novel method for separating harmonic and percussive sounds in single-channel music recordings. Standard non-negative matrix factorization (NMF) is used to obtain the activations of the most representative patterns active in the mixture. The basic idea is to automatically classify these activations according to whether they exhibit rhythmic or non-rhythmic patterns. We assume that percussive sounds are modeled by activations that exhibit a rhythmic pattern, whereas harmonic and vocal sounds are modeled by activations that exhibit a less rhythmic pattern. The classification of the NMF activations as harmonic or percussive is performed using a recursive process based on successive correlations applied to the activations. Specifically, promising results are obtained when a sound is classified as percussive through the identification of a set of peaks in the output of the fourth correlation. This is because harmonic sounds tend to be represented by one valley in a half-cycle waveform at the output of the fourth correlation. Evaluation shows that the proposed method provides competitive results compared to other reference state-of-the-art methods. Some audio examples are available to illustrate the separation performance of the proposed method.
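The recursive correlation idea can be sketched as below. The abstract does not specify the correlation or the decision rule, so this sketch assumes a repeated, mean-removed autocorrelation (non-negative lags only) and a simple count of local maxima after the fourth pass. The function names and the two synthetic activations are hypothetical.

```python
import numpy as np

def successive_autocorrelation(activation, n_passes=4):
    """Apply a mean-removed, peak-normalised autocorrelation n_passes times."""
    x = np.asarray(activation, dtype=float)
    for _ in range(n_passes):
        x = x - x.mean()
        full = np.correlate(x, x, mode="full")
        x = full[full.size // 2:]          # keep lags 0 .. len(x) - 1
        peak = np.max(np.abs(x))
        if peak > 0:
            x = x / peak
    return x

def count_peaks(x):
    """Count strict local maxima, ignoring the two end points."""
    interior = x[1:-1]
    return int(np.sum((interior > x[:-2]) & (interior > x[2:])))

# Hypothetical activations: a regular pulse train and a single slow swell.
frames = np.arange(200)
rhythmic = (frames % 10 == 0).astype(float)
sustained = np.exp(-0.5 * ((frames - 100) / 20.0) ** 2)

rhythmic_peaks = count_peaks(successive_autocorrelation(rhythmic))
sustained_peaks = count_peaks(successive_autocorrelation(sustained))
```

In this toy setting the pulse train keeps a comb of peaks at multiples of its period, while the slow swell yields a smooth curve with few local maxima. A threshold on the peak count would then act as a stand-in for the paper's percussive versus harmonic decision.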